11 research outputs found

    Characterizing web pornography consumption from passive measurements

    Get PDF
    Web pornography represents a large fraction of Internet traffic, with thousands of websites and millions of users. Studying web pornography consumption allows understanding of human behaviors and is crucial for medical and psychological research. However, given the lack of public data, such works typically build on surveys, which are limited by several factors, e.g., unreliable answers that volunteers may (involuntarily) provide. In this work, we collect anonymized accesses to pornography websites using HTTP-level passive traces. Our dataset includes about 15,000 broadband subscribers over a period of 3 years. We use it to provide quantitative information about the interactions of users with pornographic websites, focusing on time and frequency of use, habits, and trends. We distribute our anonymized dataset to the community to ease reproducibility and allow further studies.
    Comment: Passive and Active Measurements Conference 2019 (PAM 2019). 14 pages, 7 figures
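
    A minimal sketch of the kind of filtering-and-anonymization step such a pipeline implies; the CSV layout, column names, host list, and hashing scheme are assumptions for illustration, not the paper's actual format:

    ```python
    import csv
    import hashlib

    # Placeholder category list; the paper relies on its own classification
    # of pornographic websites, not this set.
    ADULT_HOSTS = {"example-adult-site.test"}

    SALT = b"per-deployment-secret"  # kept separate from the published data

    def anonymize(subscriber_id: str) -> str:
        """Replace a subscriber identifier with a salted one-way hash."""
        return hashlib.sha256(SALT + subscriber_id.encode()).hexdigest()[:16]

    def filter_trace(in_path: str, out_path: str) -> None:
        """Keep only accesses to hosts in ADULT_HOSTS, with anonymized IDs.

        Assumes a CSV trace with 'subscriber', 'timestamp', and 'host'
        columns (a hypothetical layout).
        """
        with open(in_path, newline="") as fin, \
             open(out_path, "w", newline="") as fout:
            reader = csv.DictReader(fin)
            writer = csv.writer(fout)
            writer.writerow(["user", "timestamp", "host"])
            for row in reader:
                if row["host"] in ADULT_HOSTS:
                    writer.writerow([anonymize(row["subscriber"]),
                                     row["timestamp"], row["host"]])
    ```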

    A Workload Characterization Methodology for WWW Applications

    No full text
    With World Wide Web (WWW) traffic being the fastest-growing portion of load on the Internet, describing and characterizing this workload is a central issue for any performance evaluation study. In this paper, we present an approach for generating a profile of the requests submitted to a WWW server (GET, POST, ...) that explicitly takes into account user behavior when surfing the WWW (i.e., navigating through it via a WWW browser). We present the Probabilistic Attributed Context-Free Grammar (PACFG) as a model for translating from this user-oriented view of the workload (namely, the conversations conducted within browser windows) to the methods submitted to the Web servers (or, respectively, to a proxy server). The characterization at this lower level is essential for estimating traffic on the net and is thus the starting point for evaluations of network traffic.
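
    The abstract names the PACFG model without showing it; as a rough intuition, a probabilistic grammar can rewrite a browser-level "conversation" into the sequence of HTTP methods the server sees. A toy Python sketch with invented productions and probabilities (not the grammar from the paper):

    ```python
    import random

    # Nonterminals expand into browser-level actions, which bottom out in
    # HTTP methods submitted to the server. Weights are illustrative only.
    GRAMMAR = {
        "Session":    [(["Page"], 0.4), (["Page", "Session"], 0.6)],
        "Page":       [(["GET", "Inline"], 0.8), (["FormSubmit"], 0.2)],
        "Inline":     [([], 0.3), (["GET", "Inline"], 0.7)],  # embedded objects
        "FormSubmit": [(["POST"], 1.0)],
    }

    def expand(symbol):
        """Recursively expand a nonterminal into a list of HTTP methods."""
        if symbol not in GRAMMAR:  # terminal: an HTTP method
            return [symbol]
        rules, weights = zip(*GRAMMAR[symbol])
        out = []
        for s in random.choices(rules, weights=weights)[0]:
            out.extend(expand(s))
        return out

    print(expand("Session"))  # e.g. ['GET', 'GET', 'GET', 'POST', 'GET']
    ```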

    Learning Web Request Patterns

    No full text
    Most requests on the Web are made on behalf of human users, and like other human-computer interactions, the actions of the user can be characterized by identifiable regularities. Many of these patterns of activity, both within a user and between users, can be identified and exploited by intelligent mechanisms for learning Web request patterns. Our focus is on Markov-based probabilistic techniques, both for their predictive power and for their popularity in Web modeling and other domains. Although history-based mechanisms can provide strong performance in predicting future requests, performance can be improved by including predictions from additional sources. In this chapter we review the common approaches to learning and predicting Web request patterns. We provide a consistent description of various algorithms (often independently proposed) and compare the performance of those techniques on the same data sets. We also discuss concerns for accurate and realistic evaluation of these techniques.
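
    Since the chapter centers on Markov-based prediction, a minimal first-order version is easy to make concrete: learn transition counts between consecutive requests, then predict the most frequent successor. A Python sketch; the URLs below are placeholders:

    ```python
    from collections import Counter, defaultdict

    class MarkovPredictor:
        """First-order Markov model over request sequences (a sketch)."""

        def __init__(self):
            self.counts = defaultdict(Counter)

        def train(self, sequence):
            """Count transitions between consecutive requests in a session."""
            for prev, nxt in zip(sequence, sequence[1:]):
                self.counts[prev][nxt] += 1

        def predict(self, current):
            """Return the most frequently observed successor, or None."""
            followers = self.counts.get(current)
            return followers.most_common(1)[0][0] if followers else None

    model = MarkovPredictor()
    model.train(["/index", "/news", "/sports", "/index", "/news",
                 "/sports", "/news", "/weather"])
    print(model.predict("/news"))  # '/sports' (seen twice vs once for '/weather')
    ```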

    MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data

    Full text link
    Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding the factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of browsing a website, or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes behind heterogeneous sequence data. Each hypothesis is derived from existing literature, theory, or intuition, and represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city given some observed data, a hypothesis assuming that tourists are more likely than locals to move towards points of interest can be shown to be more plausible than a hypothesis assuming the opposite. Our approach incorporates such hypotheses as Bayesian priors in a generative mixed transition Markov chain model and compares their plausibility using Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. This work thus enables a novel kind of analysis for studying sequential data in many application areas.
    Comment: Published in Data Mining and Knowledge Discovery (2017) and presented at ECML PKDD 2017
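
    The Bayes-factor comparison builds on expressing a hypothesis as Dirichlet pseudo-counts over transitions (as in the authors' earlier HypTrails work), under which the marginal likelihood of observed transition counts has a closed form. A single-group Python sketch with invented states, counts, and hypotheses (the paper's mixed, group-dependent model is richer):

    ```python
    import numpy as np
    from scipy.special import gammaln

    def log_marginal(counts, alpha):
        """log p(data | hypothesis) for a Markov chain whose rows carry
        Dirichlet priors; the multinomial coefficient is omitted because
        it cancels when comparing hypotheses on the same data."""
        total = 0.0
        for n, a in zip(counts, alpha):  # one row per source state
            total += (gammaln(a.sum()) - gammaln(a.sum() + n.sum())
                      + np.sum(gammaln(a + n) - gammaln(a)))
        return total

    # Observed transition counts between 3 states (rows: from, cols: to).
    counts = np.array([[2., 8., 0.], [1., 1., 9.], [5., 2., 3.]])

    uniform = np.ones((3, 3))                       # "no structure" belief
    towards2 = 1.0 + 4.0 * np.array([[0., 0., 1.],  # "everyone heads to
                                     [0., 0., 1.],  #  state 2" belief
                                     [0., 0., 1.]])

    # A positive log Bayes factor favors the structured hypothesis.
    print(log_marginal(counts, towards2) - log_marginal(counts, uniform))
    ```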